
Using Composite Likelihood in Loss Tomography 

Weiping Zhu, Member, IEEE






Abstract: Full likelihood has been widely used in loss tomography
because it is widely believed to produce accurate estimates,
although the full likelihood estimators proposed so far are
complex in structure and expensive to execute. In this paper we
advocate a different likelihood, called composite likelihood, to
replace the full likelihood in loss tomography for the sake of
simplicity and accuracy. Using the proposed likelihood, we derive
a number of explicit estimators together with their statistical
analysis. The analysis shows that all of the explicit estimators
perform almost as well as the full likelihood estimator in a number
of respects, including asymptotic variance, computational
complexity, and robustness. Although the discussion is restricted
to the tree topology, the methodology proposed here is also
applicable to networks of general topology.

Index Terms — Composite Likelihood, Loss tomography, 
Model, Maximum Likelihood, Explicit Estimator. 

I. Introduction 

Loss tomography has received considerable attention in the
last 10 years because it provides a new methodology for measuring
the internal characteristics of a large network. In contrast to
direct measurement, loss tomography uses statistical inference
to estimate the link-level loss rates of a network from end-to-end
observations. A number of estimators have been proposed during
this period, and some of them have been proved to be maximum
likelihood estimators
Q, H2, E, HI, 0, @, 0, 10, 0, HO). Unfortunately, none 
of the maximum likelihood estimators is presented in closed
form, and only a few of them are presented with statistical
properties such as unbiasedness, variance, and robustness. Without
these properties, it is hard to compare and evaluate the proposed
estimators. To assist the development of loss tomography, this
paper analyzes a frequently used estimator, unveils the statistical
logic it relies on, and sheds light on how to develop simple and
accurate estimators.

The estimator proposed in [1] is one of the most widely used
estimators in loss tomography. It is designed for the tree
topology and has a likelihood equation in the form of a
single-variable polynomial. The likelihood equation connects the
pass rate of a path, from the root to an internal node, to the
pass rates of the paths from the root to the children of that
internal node: the former is the variable of the likelihood
equation, and the coefficients come from the data collected by
end-to-end measurement. Our analysis shows that such a likelihood
equation considers all of the correlations among the descendants
in terms of observations, regardless of whether the correlations
are present in the data set collected from an experiment. Because
of this, the estimator and others like it are called full
likelihood estimators in the literature. They have been widely
used in loss tomography because it is widely believed that using
all correlations leads to accurate estimates. However, this
intuition may not hold if only a limited number of measurements
is carried out, since the usefulness of a correlation depends on
whether the data collected from an experiment fit the model
producing the data and on whether the correlation is completely
or partially covered by another [11]. Meanwhile, the complexity
of a full likelihood estimator is directly related to the number
of correlations considered, and so is the degree of the
polynomial. The degree of the likelihood equation is one less
than the number of descendants, so a node with six or more
descendants requires a polynomial of degree five or higher to
describe the correlations in terms of the observations among the
descendants. However, there is no general analytical solution
for a polynomial equation of degree five or higher, according to
Galois theory. Using an iterative procedure, e.g. the EM
algorithm, to solve a high degree polynomial certainly affects
the scalability of an estimator. Because of this, finding a
simple and accurate estimator has become the key issue of loss
tomography and has attracted consistent attention over the last
10 years. Despite the effort, there has been little progress in
this direction. The few explicit estimators proposed during this
period, e.g. [12], either perform worse than those using the full
likelihood, e.g. [8], or do not provide enough statistical
analysis to support themselves, e.g. [12]. This situation gives
rise to two questions: one is whether there is an accurate and
explicit estimator, and the other is whether the full likelihood
estimator is necessary for all data sets. For the former, most
believe that accuracy and simplicity cannot coexist, whereas
there has been little discussion of the latter.

To address the questions raised above, we thoroughly
analyze the full likelihood estimator presented in [1] and
carefully examine the relationship between the data collected
from an experiment and the model used to describe the data. The
aim is to find the rationale behind the use of a high degree
polynomial as the likelihood equation and to look for ideas that
can reduce the degree without loss of precision. As previously
stated, the rationale turns out to be the use of the full
likelihood in estimation. With this finding, the correspondences
between data and models are identified from the full likelihood
estimator. Our focus then switches to seeking an alternative that
takes advantage of the correspondences. Composite likelihood is
identified as the replacement for the full likelihood, and it
leads to a number of explicit estimators. The explicit estimators
in general perform as well as the full likelihood estimator in a
number of respects, including variance. In addition, the
estimators not only support real-time applications but also make
it possible to select estimators according to the data at hand.
The contribution is three-fold:

• the obstacle that blocks the emergence of simple and
accurate estimators is identified by analyzing a widely
used full likelihood estimator;

• composite likelihood is proposed to replace full likelihood,
and subsequently a number of explicit estimators are proposed.
This enables us to select models for data and thus gain
flexibility and accuracy in estimation; and

• the statistical properties of the explicit estimators are
presented, including unbiasedness, consistency, and uniqueness.
More importantly, the asymptotic variance of an explicit
estimator is presented and shown to be almost the same as
that obtained by a full likelihood estimator.

The rest of the paper is organized as follows. In Section 2,
we introduce the notation frequently used in the paper and
the composite likelihood mentioned above. We then present
the analysis of the estimator reported in [1] in Section 3. The
analysis throws light on the correspondence between data and
models in estimation. In Section 4, we divide data and
models into a number of groups according to the correlations
involved in the observations of the descendants connected to
the path being estimated. After that, we pair the data with
the models that have the same degree of correlation and
obtain a number of composite likelihood equations. A
statistical analysis of the proposed estimators, including
their statistical properties, is presented in Section 5.
The estimators are further evaluated in Section 6 by simulations
that confirm the contributions listed above. That section also
discusses the robustness of the proposed estimators, including
their ability to deal with missing data. The last section is
devoted to concluding remarks.

II. Notation and Composite Likelihood 

The symbols that are frequently used in this paper are 
introduced in this section, where the others will be defined 
when firstly used. 

A. Notation 

Let $T = (V, E)$ be a multicast tree used to dispatch probes
from a source to a number of receivers. $V = \{v_0, v_1, \dots, v_m\}$
is a set of nodes and $E = \{e_1, \dots, e_m\}$ is a set of directed
links that connect the nodes in $V$ to form the multicast tree,
where $v_0$ is the root of the multicast tree. Each node except
$v_0$ has a parent; let $v_{f(i)}$ be the parent of $v_i$, and let
$e_i$ be the link that connects $v_{f(i)}$ to $v_i$. If $f_l(i)$ is
used to denote the ancestor that is $l$ hops away from node $i$
along the path to the root, then
$a(i) = \{f(i), f_2(i), \dots, f_k(i)\}$ is the set of ancestors of
node $i$. In addition, we use $R$ to represent all receivers
attached to the leaf nodes and $R(i)$ to represent the receivers
attached to $T(i)$, the multicast subtree with $e_i$ as its root
link. $d_i$ is used to represent the descendants attached to node
$i$, which is a nonempty set, and $|d_i|$ denotes the number of
elements in $d_i$. Note that a descendant of a node is either a
multicast subtree or a link connecting a leaf node. If $n$ probes
are sent from $v_0$ to $R$, each of them gives rise to an
independent realization $X^{(i)}$, $i = 1, \dots, n$, of the pass
(loss) process $X$, where $X_k^{(i)} = 1$, $k \in V$, if probe $i$
reaches $v_k$; otherwise $X_k^{(i)} = 0$. The sample
$Y = (X_j^{(i)})_{j \in R}^{i \in \{1, \dots, n\}}$ comprises the data
set for estimation, which can be divided into sections according
to $R(k)$, where $Y_k$ denotes the part of the sample observed by
$R(k)$.

If $X_k$, the pass process of link $k$, is an independent and
identically distributed (i.i.d.) process that follows a Bernoulli
distribution, a set of sufficient statistics has been identified
in [13], one for each node, called the confirmed arrivals at the
node and defined as follows:

$$n_k(1) = \sum_{i=1}^{n} \bigvee_{j \in R(k)} y_j^i, \qquad (1)$$

where $y_j^i$ denotes the observation of receiver $j$, $j \in R(k)$,
for probe $i$: $y_j^i = 1$ if the probe is observed by receiver $j$;
otherwise, $y_j^i = 0$.

To understand the principle used by a full likelihood
estimator, we need to divide $n_k(1)$ into a number of groups
according to the number of descendants observing the probes. The
groups can be divided into two classes: single-observations
and co-observations. A single-observation, as the name suggests,
refers to the number of probes observed by the receivers attached
to one descendant of node $k$; there are $|d_k|$
single-observations, one for each descendant. In contrast, a
co-observation refers to the number of probes simultaneously
observed by a number of receivers that are attached to different
descendants, called co-observers below. Note that an observer
here represents all receivers attached to a descendant: the
observer observes a probe if at least one of its receivers
observes the probe. The co-observations can be further divided
into groups according to the number of co-observers. For node
$k$, there are $2^{|d_k|} - (|d_k| + 1)$ groups of co-observations.
To represent the single-observations and co-observations, a
$\sigma$-algebra, $S$, is created over $d_k$, where
$\Sigma_k = S \setminus \emptyset$ is for the single-observations
and co-observations. The elements of $\Sigma_k$ can be divided
into a number of groups according to the number of co-observers.
For $x$, $x \in \Sigma_k$, $\#(x)$ is used to denote the number
of co-observers in $x$.

Let $\gamma_j^i$ be the observation of descendant $T(j)$ for probe
$i$, which is equal to

$$\gamma_j^i = \bigvee_{k \in R(j)} y_k^i.$$

Then, we have

$$I_k(x) = \sum_{i=1}^{n} \bigwedge_{j \in x} \gamma_j^i$$

be the total number of probes observed by the member(s) of $x$
in an experiment. If $\#(x) = 1$, $I_k(x)$ is the single-observation
of $x$; otherwise, it is the co-observation of $x$. Then, we have

$$n_k(1) = \sum_{i=1}^{|d_k|} (-1)^{i-1} \sum_{\#(x)=i} I_k(x). \qquad (2)$$

Given (2), we are able to decompose a full likelihood equation
into a number of components.
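
The decomposition in (2) is an inclusion-exclusion identity and can be checked numerically. The following is a minimal Python sketch, not an implementation taken from the paper; the function names and the list-of-lists layout of the per-descendant observations $\gamma_j^i$ (one row per probe, one column per descendant of node $k$) are assumptions made for illustration.

```python
from itertools import combinations


def co_observation_counts(gamma):
    """Return {x: I_k(x)} for every nonempty subset x of descendants.

    gamma[i][j] is 1 if descendant j observed probe i, else 0
    (hypothetical data layout chosen for this sketch).
    """
    n_desc = len(gamma[0])
    counts = {}
    for size in range(1, n_desc + 1):
        for x in combinations(range(n_desc), size):
            # A probe counts for x only if every member of x observed it.
            counts[x] = sum(all(row[j] for j in x) for row in gamma)
    return counts


def confirmed_arrivals(gamma):
    """n_k(1): probes observed by at least one descendant, as in (1)."""
    return sum(any(row) for row in gamma)


def inclusion_exclusion(counts):
    """Right-hand side of (2): alternating sum over subset sizes."""
    return sum((-1) ** (len(x) - 1) * c for x, c in counts.items())


# Three probes, two descendants.
observations = [[1, 0], [1, 1], [0, 0]]
counts = co_observation_counts(observations)
print(confirmed_arrivals(observations), inclusion_exclusion(counts))
```

Enumerating all subsets is exponential in the number of descendants, which is acceptable for a check of the identity but mirrors the growth in terms that the full likelihood has to handle.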
B. Composite Likelihood

Composite likelihood dates back at least to the
pseudo-likelihood proposed by Besag [14] for applications that
have large correlated data sets and highly structured statistical
models. Because of the complexity of such applications, it is
hard to describe explicitly the correlation embedded in the
observations. Even if the correlation can be described, the
computational complexity often makes inference infeasible in
practice [15]. To overcome the difficulty, composite likelihood,
which considers only a number of the correlations within a given
data set, is proposed in place of the full likelihood to handle
those applications. The composite likelihood defined in [16] is
as follows:

Definition 1:

$$L_c(\theta; y) = \prod_{i \in I} f(y \in C_i; \theta)^{w_i}, \qquad (3)$$

where $f(y \in C_i; \theta) = f(\{y_j \in Y : y_j \in C_i\}; \theta)$, with
$y = (y_1, \dots, y_n)$, is a parametric statistical model, $I \subseteq N$,
and $\{w_i, i \in I\}$ is a set of suitable weights. The associated
composite log-likelihood is $l_c(\theta; y) = \log L_c(\theta; y)$.
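
As an illustration of Definition 1, the sketch below evaluates a weighted composite log-likelihood as the weighted sum of the log-densities of its component events and maximizes it over a grid. It is a toy example under stated assumptions: the Bernoulli-count component model, the grid search, and all function names are hypothetical and are not taken from [16] or from the estimators derived later in this paper.

```python
import math


def composite_loglik(theta, events, log_density, weights=None):
    """l_c(theta; y) = sum_i w_i * log f(y in C_i; theta)."""
    if weights is None:
        weights = [1.0] * len(events)
    return sum(w * log_density(c, theta) for w, c in zip(weights, events))


def bernoulli_count_logpdf(event, theta):
    """Log-density of one component: (successes, trials) with pass rate theta.

    The binomial coefficient is omitted because it does not depend on theta.
    """
    successes, trials = event
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def maximize_on_grid(events, weights=None, steps=1000):
    """Grid search for the maximum composite likelihood estimate on (0, 1)."""
    grid = [k / steps for k in range(1, steps)]
    return max(
        grid,
        key=lambda t: composite_loglik(t, events, bernoulli_count_logpdf, weights),
    )


# Two component events that share the same parameter.
events = [(7, 10), (12, 20)]
print(maximize_on_grid(events))
print(maximize_on_grid(events, weights=[1.0, 2.0]))
```

With unit weights the maximizer is the pooled success ratio; changing the weights shifts the estimate toward the more heavily weighted component, which is the role the weights $w_i$ play in (3).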

As stated, composite likelihoods can be used in parametric
estimation. Applying the maximum likelihood principle to a
composite likelihood function, i.e. setting
$\nabla \log L_c(\theta; y) = 0$, gives the composite likelihood
equation. Solving the composite likelihood equation yields the
maximum composite likelihood estimate, which under the usual
regularity conditions is unbiased, and the associated maximum
composite likelihood estimator is consistent and asymptotically
normally distributed [17]. Because of these theoretical
properties, composite likelihood has drawn considerable attention
in recent years for applications having complicated correlations,
including spatial statistics [18], multivariate survival analysis,
generalized linear mixed models, and so on [16]. Unfortunately,
there has been little work using composite likelihood in network
tomography, although network tomography is one of the
applications that have complex correlations. As far as we know,
only [19] proposes an estimator on the basis of pseudo-likelihood,
for delay tomography and SD traffic matrix tomography.

III. Full Likelihood Estimation 

As stated, there has been a lack of discussion about the 
connection between data and model in a loss rate estimator. To 
fill the gap, we examine a widely used full likelihood estimator 
in this section. 

A. Full Likelihood and its Components 

For the tree topology, the widely used full likelihood 
estimator is proposed in 0, 0, which is a polynomial 

as:

$$1 - \frac{\gamma_k}{A_k} = \prod_{j \in d_k} \left(1 - \frac{\gamma_j}{A_k}\right), \qquad (4)$$

where $A_k$ is the pass rate of the path connecting $v_0$ to $v_k$ and
$\gamma_k$ is the pass rate of a special subtree that consists of the
path from $v_0$ to $v_k$ and the subtrees rooted at $v_k$. Although a
number of properties are presented in [1], a necessary and
sufficient condition for the correctness of the estimates
obtained by (4) is missing. Without it, an incorrect estimate can
be mistakenly considered a correct one. To remedy this, we
present the following theorem, which unveils the correspondence
between data and models in the likelihood equation.
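
To make the cost of the full likelihood concrete, the sketch below solves a likelihood equation of the form $1 - \gamma_k/A_k = \prod_{j \in d_k}(1 - \gamma_j/A_k)$ numerically. Bisection is used here only as a simple stand-in for an iterative solver; it is not the procedure used in [1]. The function name, the bracketing interval $[\gamma_k, 1]$, and the tolerance are assumptions of this sketch.

```python
def full_likelihood_estimate(gamma_k, gamma_children, tol=1e-12):
    """Solve 1 - gamma_k/A = prod_j (1 - gamma_j/A) for A by bisection.

    gamma_k        empirical pass rate of the subtree rooted at node k
    gamma_children empirical pass rates, one per descendant of node k
    The root is assumed to lie in [gamma_k, 1]; if the sample does not
    bracket a root there, a ValueError is raised instead of guessing.
    """

    def h(a):
        prod = 1.0
        for g in gamma_children:
            prod *= 1.0 - g / a
        return 1.0 - gamma_k / a - prod

    lo, hi = gamma_k, 1.0
    if h(lo) * h(hi) > 0:
        raise ValueError("no sign change on [gamma_k, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Noise-free rates for a path with pass rate 0.9 and three descendants,
# each passing a probe that reached node k with probability 0.8.
gamma_children = [0.9 * 0.8] * 3
gamma_k = 0.9 * (1.0 - 0.2 ** 3)
print(full_likelihood_estimate(gamma_k, gamma_children))
```

Each evaluation of `h` multiplies over all descendants and the solver needs many such evaluations, which is the overhead the explicit estimators of Section IV avoid.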

Theorem 1: The estimate obtained from (4) converges to
the true parameter if and only if

1) the true losses occurring on a link are as assumed in
[1], i.e. they follow the Bernoulli distribution and the loss
processes of the links are i.i.d.; and

2) the observation of $R(k)$ satisfies

$$\forall x, x \in \Sigma_k, \; I_k(x) \neq 0. \qquad (5)$$



Proof: The first condition states that the estimates obtained
by (4) converge to the true parameter only if the assumed model
is the true model generating the data. Based on the conditions,
we can write a likelihood function for $A_k$ as

$$L(A_k) = A_k^{n_k(1)} (1 - A_k \beta_k)^{n - n_k(1)}, \qquad (6)$$

where $1 - \beta_k = \prod_{q \in d_k} \left(1 - \frac{\gamma_q}{A_k}\right)$ is the loss
rate of the subtree rooted at $v_k$. Turning the above into
$\log L(A_k)$, differentiating it, and letting the derivative be 0,
we have an equation as (4).

Using the empirical probabilities $\hat{\gamma}_k = \frac{n_k(1)}{n}$ and
$\hat{\gamma}_j = \frac{n_j(1)}{n}$ to replace $\gamma_k$ and $\gamma_j$ in (4), and
then expanding both sides of the equation, we can prove (5) in
three steps.

1) If we use (2) to replace $n_k(1)$ on the left hand side
(LHS) of (4), the LHS becomes:

$$1 - \frac{\hat{\gamma}_k}{A_k} = 1 - \frac{n_k(1)}{n \cdot A_k} = 1 - \frac{1}{n \cdot A_k} \sum_{i=1}^{|d_k|} (-1)^{i-1} \sum_{\#(x)=i} I_k(x). \qquad (7)$$

2) If we expand the product term located on the right hand
side (RHS) of (4), we have:

$$\prod_{j \in d_k} \left(1 - \frac{\hat{\gamma}_j}{A_k}\right) = 1 - \sum_{i=1}^{|d_k|} (-1)^{i-1} \sum_{\#(x)=i} \frac{\prod_{j \in x} \hat{\gamma}_j}{A_k^{i}}. \qquad (8)$$

3) Deducting 1 from both (7) and (8) and then multiplying
the results by $A_k$, (4) turns to

$$\sum_{i=2}^{|d_k|} (-1)^{i} \sum_{\#(x)=i} \frac{I_k(x)}{n} = \sum_{i=2}^{|d_k|} (-1)^{i} \sum_{\#(x)=i} \frac{\prod_{j \in x} \hat{\gamma}_j}{A_k^{i-1}}, \qquad (9)$$

where the terms for $i = 1$ cancel because
$I_k(\{j\})/n = \hat{\gamma}_j$. It is clear there is a 1-to-1
correspondence between the terms across the equal sign. The
correspondence becomes obvious if we rewrite (9) as

$$\sum_{i=2}^{|d_k|} (-1)^{i} \sum_{\#(x)=i} \left( \frac{I_k(x)}{n} - \frac{\prod_{j \in x} \hat{\gamma}_j}{A_k^{i-1}} \right) = 0. \qquad (10)$$

Clearly, if one of the $I_k(x) = 0$, (10) cannot hold unless
the corresponding $\prod_{j \in x} \hat{\gamma}_j = 0$. Then, (4) cannot hold
either. The alternating sign in the first summation of (10)
ensures that each probe is considered once and only once in
estimation. ■

B. Insight of Full Likelihood Estimator 

Three key points can be drawn from (10) in regard to the
full likelihood:

1) the full likelihood relies on both single-observations and
co-observations to estimate $A_k$;

2) the full likelihood requires the presence of all single-
and co-observations although the co-observations are
positively correlated; and

3) the full likelihood consists of various likelihoods, each
of which represents the correspondence between a group
of co-observations and the model generating the
co-observations.

The first point indicates the correspondence between
single-observations and co-observations, where the
single-observations and the pass rate, $A_k$, are used to model
the co-observations. This is because, although
$\prod_{j \in x} \hat{\gamma}_j$ and $\frac{I_k(x)}{n}$ are unbiased estimates
of $A_k^{\#(x)} \beta_x$ and $A_k \beta_x$, respectively, where $\beta_x$ is
the product of the pass rates of the subtrees with their root
links in $x$, the former has a smaller variance than the latter
[20]. The former is therefore used to model the latter in the
likelihood equation. If a single-observation does not intersect
with others, it is no longer needed in estimation since there is
no corresponding co-observation.

The second point unveils that the accuracy of an estimate
obtained by the estimator rests on the data set collected
from an experiment, which must contain the complete set of
co-observations defined in (5). Then, a polynomial of degree
$|d_k| - 1$ having $|d_k|$ terms, for the degrees from 0 to
$|d_k| - 1$, becomes inevitable, where each type of co-observation
is considered a dimension and the estimator aims to find a fit
within the $|d_k| - 1$ dimensions. Instead of dropping some of
the correlated observations from estimation, the estimator uses
the alternating sign of the equation to counter the possibility
of overfitting and to ensure a smaller variance for its
estimates.

The third point states that each term in the second summation
of (10) can be used as an estimator that only considers matching
one type of co-observation to the models generating it, i.e. it
measures the fitness of an $A_k$ in that dimension. Alternatively,
we can use a number of the terms in (10) to form an
$m$-dimensional space and find an $A_k$ fitting in the space.

IV. Composite Likelihood Estimator 

The insights reported in the last section inspire us to search 
for a different likelihood to construct the likelihood functions. 
Composite likelihood is then selected as an alternative that 
allows us to selectively use estimators for data instead of 
using one for all. To achieve this, a number of theorems and 
a corollary are presented in this section. 

Theorem 2: The maximum composite likelihood equation
for $A_k$ that only considers the co-observations of two
observers in $Y_k$ is as follows:

$$\sum_{\#(x)=2} \frac{I_k(x)}{n} = \sum_{\#(x)=2} \frac{\prod_{j \in x} \hat{\gamma}_j}{A_k}. \qquad (11)$$



Proof: Let

$$C_i = \{x \mid \#(x) = 2, x \in \Sigma_k\}$$

be the events obtained from $Y_k$, i.e. the events observed
simultaneously by two descendants of $v_k$. On the basis of (3),
we have the pairwise composite likelihood function as follows:

$$L_c(2, A_k; y) = \prod_{i=1}^{|d_k|-1} \prod_{j=i+1}^{|d_k|} f(y_i, y_j; A_k), \qquad (12)$$

where $\{i, j\} \in \Sigma_k$ and $w_i = 1$ to have the same weight for
each object. The parameter 2 in $L_c(\cdot)$ indicates that the
likelihood function is for the pairwise likelihood. Let
$x = \{i, j\}$, $i, j \in d_k$; we then have a likelihood object as

$$f(y_i, y_j; \theta) = A_k^{n_x(1)} (1 - A_k \beta_x)^{n - n_x(1)},$$

where $n_x(1) = n_i(1) + n_j(1) - I_k(x)$ and $\beta_x$ is the combined
pass rate of subtrees $T(i)$ and $T(j)$, $i, j \in d_k$, and
$1 - \beta_x = \prod_{q \in x} \left(1 - \frac{\gamma_q}{A_k}\right)$. Then,

$$L_c(2, A_k; y) = \prod_{\#(x)=2} A_k^{n_x(1)} (1 - A_k \beta_x)^{n - n_x(1)}. \qquad (13)$$

Further, taking the derivative of $\log L_c(2, A_k; y)$, we have

$$\nabla \log L_c(2, A_k; y) = \sum_{\#(x)=2} \left[ 1 - \frac{\gamma_k(x)}{A_k} - \prod_{q \in x} \left(1 - \frac{\gamma_q}{A_k}\right) \right], \qquad (14)$$

where $\gamma_k(x)$ denotes the pass rate from $v_0$ via $v_k$ to the
virtual subtree formed by the subtrees enclosed in $x$. Letting
(14) be 0, we have the pairwise likelihood equation as (11). ■

It is clear that (11) is equivalent to setting the first term
of (10) to 0. Using (13) as the likelihood function, all
confirmed arrivals at $v_k$ are taken into account in estimation.
The difference between (4) and (11) is the likelihood used in
estimation.

If each term of (10) is set to 0, we obtain a set of
composite likelihood equations, each of which is focused on
one type of co-observation. The following corollary expresses
the form of the likelihood functions.

Corollary 1: Let $L_c(1; A_k; y) = 1$. A set of composite
likelihood functions is obtained, each of which considers one
type of the $|d_k| - 1$ co-observations, from pairwise to
$|d_k|$-wise. The likelihood functions can be expressed in a
recursive manner. Let $L_c(i; A_k; y)$ be the $i$-wise,
$2 \le i \le |d_k|$, composite likelihood function, which has the
following form

$$L_c(i; A_k; y) = \frac{\prod_{\#(x)=i} A_k^{n_x(1)} (1 - A_k \beta_x)^{n - n_x(1)}}{\prod_{r=1}^{i-1} L_c(r; A_k; y)}, \qquad (15)$$

where $n_x(1)$ is the confirmed arrivals at a virtual node $x$,
obtained from $Y_x$, and $\beta_x$ is the combined pass rate of the
subtrees composed by the members of $x$.

Proof: It is clear that the numerator on the RHS of (15) is
the likelihood function considering all co-observations from
pairwise to $i$-wise and the denominator is the product of the
likelihood functions from pairwise to $(i-1)$-wise. The corollary
then follows. ■
There are $|d_k| - 1$ composite likelihood functions in (15),
each of which measures the fitness between a type of
co-observation and the models generating the co-observations.
Since the co-observations considered by each of the composite
likelihoods have the same number of co-observers, the likelihood
equation obtained is a polynomial that has only two terms: one
is a term of degree $i - 1$ in the estimate and the other is a
constant. Then, we have the following theorem.

Theorem 3: Each of the composite likelihood equations
obtained from (15) is an explicit estimator of $A_k$. Let $A_k(i)$
be the explicit estimator for $i$-wise co-observations that returns
an estimate of $A_k$. We then have

$$A_k(i) = \left( \frac{\sum_{\#(x)=i} \prod_{j \in x} \hat{\gamma}_j}{\sum_{\#(x)=i} \frac{I_k(x)}{n}} \right)^{\frac{1}{i-1}}, \quad i \in \{2, \dots, |d_k|\}. \qquad (16)$$

Proof: Firstly, we write (15) as a log-likelihood; we
then differentiate the log-likelihood and let the derivative be
0. Finally, the likelihood equation for the $i$-wise likelihood
function is presented as (16). ■
In the rest of the paper, we use $A_k(i)$ to refer to the $i$-wise
estimator and $\hat{A}_k(i)$ to refer to the estimate obtained by
$A_k(i)$, where $i$ is called the index of the estimator.
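
The explicit form of the $i$-wise estimator, a ratio of two sums raised to the power $1/(i-1)$, translates directly into a few lines of code. The following is a minimal Python sketch under stated assumptions, not the author's implementation: the probe simulator, the function names, and the data layout (one row per probe, one 0/1 column per descendant of node $k$) are hypothetical.

```python
import random
from itertools import combinations


def simulate_probes(n, path_rate, child_rates, seed=0):
    """Simulate n probes: Bernoulli loss on the path to node k, then
    independent Bernoulli loss towards each descendant of node k."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        reached = rng.random() < path_rate
        data.append([int(reached and rng.random() < a) for a in child_rates])
    return data


def explicit_estimate(gamma, i):
    """i-wise estimate: (sum of products of single-observation rates over
    all i-subsets) / (sum of co-observation rates), to the power 1/(i-1)."""
    n = len(gamma)
    n_desc = len(gamma[0])
    gamma_hat = [sum(row[j] for row in gamma) / n for j in range(n_desc)]
    numerator = 0.0
    denominator = 0.0
    for x in combinations(range(n_desc), i):
        prod = 1.0
        for j in x:
            prod *= gamma_hat[j]
        numerator += prod
        denominator += sum(all(row[j] for j in x) for row in gamma) / n
    if denominator == 0:
        raise ValueError("no co-observation of size i in the sample")
    return (numerator / denominator) ** (1.0 / (i - 1))


data = simulate_probes(20000, 0.9, [0.8, 0.8, 0.8], seed=1)
for index in (2, 3):
    print(index, explicit_estimate(data, index))
```

No root finding is involved: one pass over the data collects the single- and co-observation counts, and the estimate follows in closed form for every index from 2 to the number of descendants.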



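Lemma 1: Let

$$lm_k(x) = \frac{\prod_{j \in x} \hat{\gamma}_j}{\frac{I_k(x)}{n}}, \quad x \in \Sigma_k, \; \#(x) \ge 2.$$

$lm_k(x)$ is an unbiased estimate of $A_k^{\#(x)-1}$.

Proof: Let $\hat{n}_k(1)$ be the number of probes reaching $v_k$
and let $z_j = \frac{n_j(1)}{\hat{n}_k(1)}$ be the sample mean of the pass rate
of descendant $j$. Then

$$E(lm_k(x)) = E\left( \frac{\prod_{j \in x} \hat{\gamma}_j}{\frac{I_k(x)}{n}} \right) = E\left( \left( \frac{\hat{n}_k(1)}{n} \right)^{\#(x)-1} \right) E\left( \frac{\prod_{j \in x} z_j}{\frac{I_k(x)}{\hat{n}_k(1)}} \right) = A_k^{\#(x)-1}. \qquad (18)$$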



A. Weighted Mean

Let $z_j$ be the pass rate of descendant $j$. If $z_j$ were known,
(16) could be written as

$$A_k(i)^{i-1} = \sum_{\#(x)=i} \frac{\prod_{j \in x} z_j}{\sum_{\#(y)=i} \prod_{j \in y} z_j} A_x^{i-1}, \qquad (17)$$

where $A_x^{i-1}$ is equal to $\frac{\prod_{j \in x} \hat{\gamma}_j}{I_k(x)/n}$.
Then, $A_k(i)^{i-1}$ is a weighted mean of $A_x^{i-1}$,
$x \in \Sigma_k \wedge \#(x) = i$, and $A_k(i)$ is a weighted power
mean. The weight assigned to $A_x^{i-1}$ depends on the pass rates
of the subtrees in $x$. An $A_x$ that has a high weight
contributes more to the mean than those having low weights.
Since $z_j$ is not directly observable, $\hat{\gamma}_j$ and $I_k(x)$ are
used in (16) to achieve the same effect as the weighted power
mean. Because of this, the estimators share similar statistical
properties with the weighted power mean.
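
The weighted-mean reading can be checked on a finite sample: the ratio-of-sums form of $A_k(i)^{i-1}$ coincides exactly with a weighted mean of the per-subset ratios $\prod_{j \in x} \hat{\gamma}_j / (I_k(x)/n)$ when each subset is weighted by its empirical co-observation rate $I_k(x)/n$. The sketch below verifies this algebraic identity. Note the assumption: the weights used here are the observable co-observation rates, standing in for the unobservable products of $z_j$, and the function names are hypothetical.

```python
from itertools import combinations


def local_ratios(gamma, i):
    """Return {x: (ratio_x, co_rate_x)} for all i-subsets x, where
    ratio_x = prod_j gamma_hat_j / co_rate_x and co_rate_x = I_k(x)/n."""
    n = len(gamma)
    n_desc = len(gamma[0])
    gamma_hat = [sum(row[j] for row in gamma) / n for j in range(n_desc)]
    out = {}
    for x in combinations(range(n_desc), i):
        co_rate = sum(all(row[j] for j in x) for row in gamma) / n
        if co_rate == 0:
            raise ValueError("a subset has no co-observation")
        prod = 1.0
        for j in x:
            prod *= gamma_hat[j]
        out[x] = (prod / co_rate, co_rate)
    return out


def ratio_of_sums(gamma, i):
    """A_k(i)^(i-1) written as (sum of products) / (sum of co-rates)."""
    loc = local_ratios(gamma, i)
    return sum(r * c for r, c in loc.values()) / sum(c for _, c in loc.values())


def weighted_mean(gamma, i):
    """The same quantity as a mean of the ratios weighted by co-rates."""
    loc = local_ratios(gamma, i)
    total = sum(c for _, c in loc.values())
    return sum((c / total) * r for r, c in loc.values())


probes = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
print(ratio_of_sums(probes, 2), weighted_mean(probes, 2))
```

Because the weights are proportional to the co-observation rates, subsets whose subtrees pass more probes dominate the mean, which matches the remark that a highly weighted term contributes more.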

V. Properties of Composite Likelihood Estimator 

This section is devoted to evaluating the statistical properties
of the $|d_k| - 1$ estimators presented in (16), including
unbiasedness, consistency, uniqueness, and robustness. The
following theorems and lemmas establish these properties.
Firstly, we have



Lemma 1 shows that although $lm_k(x)$ only considers a part
of the observations obtained from an experiment, it can be
an unbiased estimator of $A_k^{\#(x)-1}$. To simplify the following
discussion, we call $lm_k(\cdot)$ a local estimator. Accordingly, the
estimate obtained by $lm_k(\cdot)$ is called a local estimate and
denoted by $\hat{lm}_k(\cdot)$. Based on Lemma 1, we have

Theorem 4: $(lm_k(x))^{\frac{1}{\#(x)-1}}$, $x \in \Sigma_k$, is an unbiased
estimate of $A_k$.

Proof: This can be obtained directly from Lemma 1. ■
Since $A_k(|d_k|)$ is the $(|d_k| - 1)$-th root of $lm_k(d_k)$, we can
conclude that $A_k(|d_k|)$ is an unbiased estimator. Apart from
$A_k(|d_k|)$, we are not able to prove that the other $A_k(i)$ are
unbiased estimators since we are not able to prove



E 



E 



n 



, s . J2 ze£ fe n 
#( X % #(*)=* 



3 Ex Z 3 



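if $n < \infty$. Nevertheless, we have

Theorem 5: The estimators presented in (16) are asymptotically
unbiased estimators.

Proof: According to Lemma 1,

$$A_k(i) = \hat{A}_k \left( \frac{\sum_{\#(x)=i} \prod_{j \in x} z_j}{\sum_{\#(x)=i} \frac{I_k(x)}{\hat{n}_k(1)}} \right)^{\frac{1}{i-1}}, \qquad (19)$$

where $\hat{n}_k(1)$ is the number of probes reaching $v_k$,
$\hat{A}_k = \frac{\hat{n}_k(1)}{n}$, and $z_j = \frac{n_j(1)}{\hat{n}_k(1)}$. Then,

$$\lim_{n \to \infty} E(A_k(i)) = \lim_{n \to \infty} E(\hat{A}_k) \, E\left( \left( \frac{\sum_{\#(x)=i} \prod_{j \in x} z_j}{\sum_{\#(x)=i} \frac{I_k(x)}{\hat{n}_k(1)}} \right)^{\frac{1}{i-1}} \right) = A_k \, E\left( \lim_{n \to \infty} \left( \frac{\sum_{\#(x)=i} \prod_{j \in x} z_j}{\sum_{\#(x)=i} \frac{I_k(x)}{\hat{n}_k(1)}} \right)^{\frac{1}{i-1}} \right) = A_k. \qquad (20)$$

The theorem follows. ■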
Further, we have the following lemma before proving that
$A_k(i)$ is a consistent estimate.

Lemma 2: 1) $lm_k(x)$ is a consistent estimate of
$A_k^{\#(x)-1}$; and
2) $lm_k(x)^{\frac{1}{\#(x)-1}}$ is a consistent estimate of $A_k$.
Proof:

1) Lemma 1 shows that $lm_k(x)$ is equivalent to the first
moment of $A_k^{\#(x)-1}$. Then, according to the law of large
numbers, $lm_k(x) \to A_k^{\#(x)-1}$.

2) From the above and the continuity of $lm_k(x)^{\frac{1}{\#(x)-1}}$ in
the values of $\hat{\gamma}_j$, $j \in x$, and $I_k(x)/n$ generated as $A_k$
ranges over its support set, the result follows.

■

Then, we have

Theorem 6: $A_k(i)$ is a consistent estimate of $A_k$.
Proof: As stated, $A_k(i)$ is an estimator using the weighted
power mean in estimation. Let
$S_k(i) = \{x \mid x \in \Sigma_k \wedge \#(x) = i\}$; we have

$$\min_{x \in S_k(i)} lm_k(x)^{\frac{1}{i-1}} \le A_k(i) \le \max_{x \in S_k(i)} lm_k(x)^{\frac{1}{i-1}}. \qquad (21)$$

Then, from Lemma 2, the theorem follows. ■
Further, we have the following for uniqueness.
Theorem 7: If

$$\sum_{\#(x)=i} \prod_{j \in x} \hat{\gamma}_j < \sum_{\#(x)=i} \frac{I_k(x)}{n},$$

there is only one solution in $(0, 1)$ for $2 \le i \le |d_k|$.

Proof: This is obvious. ■
As in [1] and [8], which use a linear function to model the loss
rate of a link, the Delta method can be used to prove that the
asymptotic variance of $A_k(i)$ is the same as that of the full
likelihood estimate, to first order in the maximum loss rate of
the links in $E$. Firstly, a number of symbols introduced in [1]
and [8], for theorem 5 and lemma 1 respectively, are presented;
they will be used in the proofs of Lemma 3 and Theorem 8 later
in this section. Among the symbols, $\alpha_k$, $k \in E$, is the pass
rate of link $k$, $\bar{\alpha}_k = 1 - \alpha_k$ is the loss rate of link
$k$, and $\|\bar{\alpha}\| = \max_{k \in E} |\bar{\alpha}_k|$. In addition, we
have $\gamma_x$ for the pass rate of a special tree consisting of the
path from node 0 to $v_k$ and the subtrees rooted at node $k$ with
their root links in $x$. In addition, let
$t_x = \sum_{j \in x} \bar{\alpha}_j$, $x \in \Sigma_k$.
Then, we have the following lemma that generalizes Lemma
1 in [8]:

Lemma 3: 1) $Cov(\hat{\gamma}_x, \hat{\gamma}_y) = \gamma_x (1 - \gamma_y)$ if $y \subseteq x$ and
$x, y \in \Sigma_k$; otherwise $Cov(\hat{\gamma}_x, \hat{\gamma}_y) = \gamma_{x \vee y} - \gamma_x \gamma_y$ if
$x, y \in \Sigma_k$ and $y \not\subseteq x$ or $x \not\subseteq y$.
2) $Cov(\hat{\gamma}_x, \hat{\gamma}_y) = s_k + \bar{\alpha}_y + O(\|\bar{\alpha}\|^2)$ if $y \subseteq x$ and
$x, y \in \Sigma_k$; $Cov(\hat{\gamma}_x, \hat{\gamma}_y) = s_k + t_{x \wedge y} + O(\|\bar{\alpha}\|^2)$ if
$x, y \in \Sigma_k$ and $y \not\subseteq x$ or $x \not\subseteq y$.
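Proof: $x$ and $y$ here can be considered virtual nodes if
$\#(x)$ or $\#(y)$ is larger than 1, where $\hat{\gamma}_x = \frac{I_k(x)}{n}$,
$x \in \Sigma_k$, and $\hat{\gamma}_y = \frac{I_k(y)}{n}$, $y \in \Sigma_k$. Since the loss
process is a 0-1 distribution, we have

$$E(\hat{\gamma}_x \hat{\gamma}_y) = P\Big(\bigwedge_{j \in x \vee y} Y_j = 1\Big) = P\Big(\bigwedge_{j \in x \vee y} Y_j = 1 \,\Big|\, X_k = 1\Big) P(X_k = 1) = \frac{\prod_{j \in x \vee y} \gamma_j}{A_k^{\#(x \vee y) - 1}} \qquad (22)$$

if $x \not\subseteq y$ and $y \not\subseteq x$. Then,

$$Cov(\hat{\gamma}_x, \hat{\gamma}_y) = E(\hat{\gamma}_x \hat{\gamma}_y) - E(\hat{\gamma}_x) E(\hat{\gamma}_y) = \frac{\prod_{j \in x \vee y} \gamma_j}{A_k^{\#(x \vee y) - 1}} - \frac{\prod_{j \in x} \gamma_j \prod_{j \in y} \gamma_j}{A_k^{\#(x) + \#(y) - 2}} = \gamma_{x \vee y} - \gamma_x \gamma_y. \qquad (23)$$

If $x, y \in \Sigma_k$ and $y \subseteq x$, we have

$$E(\hat{\gamma}_x \hat{\gamma}_y) = E(\hat{\gamma}_x) = \gamma_x \qquad (24)$$

and then

$$Cov(\hat{\gamma}_x, \hat{\gamma}_y) = \gamma_x (1 - \gamma_y). \qquad (25)$$

2) According to Lemma 1 in [8], where
$A_k = 1 - s_k + O(\|\bar{\alpha}\|^2)$ and $\gamma_k = 1 - s_k + O(\|\bar{\alpha}\|^2)$, we
have $\gamma_x = 1 - s_k - t_x + O(\|\bar{\alpha}\|^2)$. Using the values in (23),
we have

$$Cov(\hat{\gamma}_x, \hat{\gamma}_y) = s_k + t_{x \wedge y} + O(\|\bar{\alpha}\|^2) \qquad (26)$$

for $x, y \in \Sigma_k$ and $y \not\subseteq x$ or $x \not\subseteq y$, where
$s_k = \sum_{j \in a(k)} \bar{\alpha}_j$. For $x, y \in \Sigma_k$ and
$y \subset x \wedge \#(y) = 1$, we have

$$Cov(\hat{\gamma}_x, \hat{\gamma}_y) = \gamma_x (1 - \gamma_y)$$
$$= (1 - s_k - t_x + O(\|\bar{\alpha}\|^2)) (1 - (1 - s_k - \bar{\alpha}_y + O(\|\bar{\alpha}\|^2)))$$
$$= (1 - s_k - t_x + O(\|\bar{\alpha}\|^2)) (s_k + \bar{\alpha}_y + O(\|\bar{\alpha}\|^2))$$
$$= s_k + \bar{\alpha}_y + O(\|\bar{\alpha}\|^2). \qquad (27)$$

■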

Given Lemma 3, we have the following theorem for the
asymptotic variance of $A_k(i)$.

Theorem 8: As $n \to \infty$, $n^{\frac{1}{2}} (A_k(i) - A_k)$ has an
asymptotically Gaussian distribution of mean 0 and variance
$v^E(A_k(i))$, where $v^E(A_k(i)) = s_k + O(\|\bar{\alpha}\|^2)$.

Proof: Using the central limit theorem, we can prove that
$n^{\frac{1}{2}} (A_k(i) - A_k)$ has an asymptotically Gaussian distribution
of mean 0 and variance $v^E$ as $n \to \infty$. We focus here on
proving the variance. Let

$$D(i) = \{x \mid \#(x) = i, x \in \Sigma_k\},$$

which has

$$|D(i)| = \binom{|d_k|}{i}$$

elements. By the Delta method, $n^{\frac{1}{2}} (A_k(i) - A_k)$ converges
in distribution to a zero mean Gaussian random variable with
variance

$$v^E(A_k(i)) = \nabla A_k(i) \cdot C^E \cdot \nabla A_k(i)^T,$$

where $\nabla A_k(i)$ is a derivative vector obtained as follows:



{t xAy : x,y G D(i)} is a symmetric matrix, where the 
diagonal entries are equal to t x ,x G D(i) since < lAx = t x . 
The sum of each column in {t xAy is equal to 



E 



141 
i - 



1 



= ({^ 



(1)}) 



C E is a square covariance matrix in the form of 



C E = ( i Cov (^^y)} { Cov (7x,7j)} 
\{Cov('y x ,'y j )} {Cou(7 3 -,7j/)} 



The bottom left matrix has |Z)(1) | columns, there are #(x) 
entries in a column that equal to otj ,j G x, respectively. The 
top right matrix is the transpose of the bottom left one that has 
( i-i 1 ) ena "i es in a column that all equal to <x,; the bottom 
right matrix is a diagonal matrix that has its diagonal entries 
equaling to ctj,j G Let M denote the the matrix in 

QUI ). We have VA k {i) ■ M ■ V 'A k (i) T = Then, the theorem 
follows. ■ 
Compared with theorem 3 (i) in [1], Theorem 8 shows that the
asymptotic variance of $A_k(i)$ is the same as that of (4), to the
first order in the pass rate of a link.

Corollary 2: Under the same condition as that in Theorem 8,
the asymptotic variance of $lm_k(x)$ is equal to
$s_k + O(\|\bar{\alpha}\|^2)$.

Proof: $lm_k(x)$ can be considered a special $A_k(i)$ that
has a two-element $\nabla lm_k(x)$ and a four-element $C^E$. Using
the same procedure as that in the proof of Theorem 8, we have
the corollary. ■



(29) A. Example 



Each of the four components is a matrix, where $x, y \in D(i)$
and $j, j' \in D(1)$. To calculate $\nabla A_k(i)$ and derive the first
order of $v^E$ in $\bar{\alpha}$, let $de(i)$ denote the denominator of
$A_k(i)$ and $no(i)$ denote the numerator. In addition, let
$no(i, j) \subset no(i)$, $j \in d_k$, be the terms of $no(i)$ that
contain $\hat{\gamma}_j$. Then, we have


we use an example to illustrate that composite estimators 
has an asymptotic variance that is very close to that obtained 
by (@). The setting used by the example is identical to the one 
presented in J8), where v k has three descendants with a pass 
rate of a and the pass rate from vq to v k is also equal to a. 
Three estimators are considered, which are A k (2), A k (3) and 
©. Note that 



vA k {i) 



({- 



1 



A k (i) 

i — 1 v L de(i) 



■ j e D(i)}, 



A k (3) 



no(i,j) 
7j • no(i) 



717273 
4({1,2,3}) 



1/2 



:je £>(!)}) 



The number of terms in the denominator and the nominator 

of Afc(i) is the same, i.e. |cfe(i)| = \no(i)\ and \no(i,j)\ = 
(l^ 1 ) and var{ lx ) = lx (l - lx ) - s k + t x + 0(\\a\\ 2 ). 
Inserting the values and those obtained in Lemma [3] into 
VA k (i) and C E , we have 



is the same as the explicit estimator proposed in JS]. On the 
other hand, 



Afc(2) 



7i 72 + 7i 73 + 7273 



J fc ({l,2}) | J fc ({l,3» 



4 ({2, 3}) 



1 



■({ 



-1 



i- 1 \D(i)\ 



3 e £>(<)}, 



{- 



|D(<) 



i G £>(!)}) + 0(|| a ||) 



where 



and 



v E (A k (2)) = VA k (2)C E VA k (2) 



(31) 



(32) 



and 



D{2) = {x\#(x)=2,xeX k }. 

We then have C E that is the 2 * \d k \ dimensional square 
covariance matrix in the form of i 



s k 



{t xAy :x,yeD(i)} {a 3 :jeD{\)} 
{a h jeD{l)} Diag{aj : j G D(l)} 

+0(\\a\\ 2 ) 



(3oy{i,3} 



Let 7{i,2} 

4({1,3}) 



4({l,2}) 



7{2,3} 



4({2,3}) 



and 



n n 
Based on the above setting we have 



x 



7{ij} = a 3 ,i,j S {1,2,3} and 7i = a 2 , 1 G {1,2,3}. Then, 
the top-left of C E is 

v2 



a 3 (l-a 3 ) a 6 ( 

1 a ^ «[' 



1) a 6 (- 



1 



1) 



a b (— -1) a 3 (l-a 3 ) a b (— - 1) 



°L l 

\a 6 (^j-l) a 6 (- 

the top-right is 
/ 



0!*° 

1) a 3 (l-a 3 ) 



1 



a 3 (l-a 2 ) a 3 (l-a 2 ) a b (-% - - 



a 3 (l-a 2 ) a 6 (- 

\ ar a 
the bottom-left is 

/a 3 (l-a 2 ) 

a 3 (l-a 2 ) 
«,1 ^ 



-) a 3 (l-a 2 ) 



a 3 (l- 


« 2 ) 


a 3 (l 


a 3 (l- 


a 2 ) 


a 5 (-- 
a 

a 3 (l- 


a 


-1) 


a 3 (l- 


a 2 ) 


a 3 (l- 



and the bottom-right is 

(n 2 (l-a 2 ) 
„,1 



\a 4 (--l) a 4 (- 



-1) 


a 4 (- 
1 


-1)) 


-a 2 ) 


-1) 


a 




-1) 


a 2 (l- 





In addition, we have 



-1 2 



VA fe (2) - ( ^ , , 3ft2 , ^ , ^ , ^ ). 

Using d32l . we have 

v^{A k {2)) = a-\a 2 + 0{\\af). 

Comparing this with the results presented in ||8] for Ak(3) and 
Ah, where 

t2 



v*(A k (3))=a- 



a 
T 



•0(||a| 



and 



W f(i fc ) = a-a 2 + 0(||a|| 3 ), 



Afc(2) performs better than Ak(3) in terms of asymptotic 
variance. This shows that an estimator consisting of more 
terms may perform better than the others with less terms. 
Note that the full likelihood estimator is the minimum variance 
unbiased estimator that has been proved in [ 13]. 
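The comparison in the example can be checked numerically. The sketch below is ours rather than part of the original analysis. It assumes the example setting (three descendants, pass rate $\alpha$ on every link and on the path from $v_0$ to $v_k$), builds the covariances of the observation indicators directly from that setting, and evaluates the Delta-method quadratic form for $A_k(2)$ and $A_k(3)$. The gradient entries are obtained by differentiating the two estimators at the true parameter values; the function names are hypothetical.

```python
from itertools import combinations

RECEIVERS = (1, 2, 3)  # the three descendants of node k in the example


def joint(alpha, members):
    """P(every receiver in `members` sees a probe): shared path plus one link each."""
    return alpha ** (1 + len(members))


def cov(alpha, x, y):
    """Covariance of the indicators 'all of x see the probe' and 'all of y see it'."""
    return joint(alpha, x | y) - joint(alpha, x) * joint(alpha, y)


def delta_variance(alpha, sets, grad):
    """Quadratic form grad * C_E * grad^T used by the Delta method."""
    return sum(gx * gy * cov(alpha, x, y)
               for gx, x in zip(grad, sets)
               for gy, y in zip(grad, sets))


def variance_pairwise(alpha):
    """Asymptotic variance of A_k(2) under the assumed example setting."""
    pairs = [frozenset(p) for p in combinations(RECEIVERS, 2)]
    singles = [frozenset((j,)) for j in RECEIVERS]
    # Derivatives w.r.t. the three pair rates, then the three single rates.
    grad = [-1 / (3 * alpha ** 2)] * 3 + [2 / (3 * alpha)] * 3
    return delta_variance(alpha, pairs + singles, grad)


def variance_triple(alpha):
    """Asymptotic variance of A_k(3) under the assumed example setting."""
    sets = [frozenset(RECEIVERS)] + [frozenset((j,)) for j in RECEIVERS]
    # Derivative w.r.t. the triple rate, then the three single rates.
    grad = [-1 / (2 * alpha ** 3)] + [1 / (2 * alpha)] * 3
    return delta_variance(alpha, sets, grad)


if __name__ == "__main__":
    alpha = 0.99
    print("A_k(2):", variance_pairwise(alpha))
    print("A_k(3):", variance_triple(alpha))
    print("alpha * (1 - alpha):", alpha * (1 - alpha))
```

For a pass rate close to 1, both values are close to the loss rate $1 - \alpha$, and the pairwise value is the smaller of the two, which agrees with the ordering stated in the example.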

VI. Estimator Evaluation 
According to (19), the quality of $A_k(i)$ depends on

n 7 



E 

#(x)=i 



IjEx z i 



J2 x£z k n 



#(*)= 



j ex z 3 



(33) 



which measures the fit between the $i$-wise co-observations and the $i$-wise model, although $f(i)$ is not observable. If $f(i)$ equals 1, the data fit the model perfectly. As stated, each type of co-observation can be considered a dimension, and the estimators presented in (16) measure the fit in different dimensions. Although a good fit in one dimension does not ensure a good fit in the others, Theorem 6 ensures that $\forall i, i \in \{2, .., |d_k|\}$, $f(i) \to 1$ as $n \to \infty$. Thus, as $n \to \infty$, all $A_k(i)$ converge to the true value and there is little difference between them.

If $n < \infty$, the estimates obtained by some of the estimators in (16) may be better than that obtained by the full likelihood estimator. We can use the composite Kullback-Leibler divergence to find the best estimator for a data set, where the Akaike information criterion (AIC) [21] is computed for each of the estimators.

To confirm the above, three rounds of simulations are conducted in various settings, where five estimators are compared against each other: the full likelihood estimator, the pairwise likelihood estimator $A_k(2)$, the triple-wise likelihood estimator $A_k(3)$, a pair local estimator $lm_k(2)$, and a triple local estimator $lm_k(3)$. The results are presented in a number of tables. The number of samples used in the simulations starts at 300 and ends at 9900, in steps of 300. For each sample size, 20 experiments are carried out to obtain the mean and variance. Only a part of the results is presented in the tables: the means and variances for all sample sizes from 300 to 3000 are included, while two sample sizes, 4800 and 9900, represent those larger than 3000. Table I shows the results obtained from a tree with 8 descendants, where the loss rate of each link is set to 1%. In general, when the sample is small, the estimates obtained by all estimators drift away from the true value, which indicates that the data obtained are not sufficient. Once the sample size reaches 2000, the estimates approach the true value, which shows that there is enough information to make an accurate estimate. All of the estimators achieve the same outcome as the number of samples increases. However, as the number of samples increases, the variance decreases slowly, with a number of exceptions. The results also show that if the loss rate is low, such as 1%, there is little difference between the estimators in terms of the means and the variances of the estimates.
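The simulation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the tables. It assumes a two-level tree (a root link followed by leaf links), independent Bernoulli losses, and the composite estimator in the form $A_k(i) = \big(\sum_{\#(x)=i} \prod_{j \in x} \gamma_j \,/\, \sum_{\#(x)=i} I_k(x)/n\big)^{1/(i-1)}$, where $\gamma_j$ is the fraction of probes observed by descendant $j$ and $I_k(x)$ counts the probes observed by all members of $x$. The function names, the seed and the default parameters are hypothetical.

```python
import random
from itertools import combinations
from math import prod
from statistics import mean, variance


def simulate_probes(n, root_loss, leaf_losses, rng):
    """Send n probes down a two-level tree; return one tuple of booleans per probe."""
    probes = []
    for _ in range(n):
        reached_k = rng.random() >= root_loss
        probes.append(tuple(reached_k and rng.random() >= loss
                            for loss in leaf_losses))
    return probes


def composite_pass_rate(probes, i):
    """Assumed i-wise composite estimate of the pass rate of the root link."""
    n = len(probes)
    d = len(probes[0])
    gamma = [sum(p[j] for p in probes) / n for j in range(d)]
    numerator = 0.0
    denominator = 0.0
    for x in combinations(range(d), i):
        numerator += prod(gamma[j] for j in x)
        # Fraction of probes observed by every member of x.
        denominator += sum(all(p[j] for j in x) for p in probes) / n
    # Raises ZeroDivisionError if no i-wise co-observation exists in the data.
    return (numerator / denominator) ** (1.0 / (i - 1))


def run_experiments(n, i, root_loss=0.01, leaf_losses=(0.01,) * 8,
                    repeats=20, seed=1):
    """Mean and variance of the estimated root loss rate over repeated runs."""
    rng = random.Random(seed)
    losses = [1.0 - composite_pass_rate(
                  simulate_probes(n, root_loss, leaf_losses, rng), i)
              for _ in range(repeats)]
    return mean(losses), variance(losses)


if __name__ == "__main__":
    for n in (300, 3000):
        print(n, run_experiments(n, 2), run_experiments(n, 3))
```

The printed means should lie near the 1% loss rate of the root link, with a variance that shrinks as the sample size grows; the exact figures depend on the seed and will not reproduce the tables.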

To investigate the impact of different loss rates on estimation, another round of experiments is carried out on the same network topology, where 6 of the descendant links have loss rates equal to 1% and the other two have 5%. The paired local estimator uses a co-observation that consists of one link from each class to estimate the loss rate, while the triple local estimator uses a co-observation consisting of two of the 1% links and one of the 5% links. The results are presented in Table II. Compared with Table I, there is little difference among the full likelihood, $A_k(2)$ and $A_k(3)$ estimators in terms of the means and variances obtained. In contrast, there is a slight difference between the two rounds of simulations for the two local estimators, where the variance in the second round is in most cases slightly larger than that in the first round. This is because the higher loss rates on some of the descendant links affect the variance of the estimate of the root link, in particular if the sample size is small. Comparing the two local estimators, one is able
Samples   Full Likelihood      Pair Likelihood      Triple Likelihood    Single Pair          Single Triple
          Mean     Var         Mean     Var         Mean     Var         Mean     Var         Mean     Var
300       0.0088   1.59E-05    0.0088   1.59E-05    0.0088   1.64E-05    0.0087   1.59E-05    0.0087   1.61E-05
600       0.0089   1.12E-05    0.0089   1.12E-05    0.0089   1.13E-05    0.0089   1.10E-05    0.0088   1.12E-05
900       0.0092   7.76E-06    0.0092   7.82E-06    0.0091   7.84E-06    0.0092   7.90E-06    0.0092   8.15E-06
1200      0.0095   6.13E-06    0.0095   6.13E-06    0.0094   6.17E-06    0.0095   6.16E-06    0.0095   5.97E-06
1500      0.0096   4.55E-06    0.0096   4.55E-06    0.0096   4.80E-06    0.0096   4.78E-06    0.0096   4.33E-06
1800      0.0096   1.82E-06    0.0096   1.81E-06    0.0096   1.92E-06    0.0097   1.92E-06    0.0096   1.90E-06
2100      0.0097   3.14E-06    0.0097   3.11E-06    0.0097   3.14E-06    0.0097   3.02E-06    0.0097   3.08E-06
2400      0.0100   1.32E-06    0.0100   1.32E-06    0.0100   1.36E-06    0.0100   1.29E-06    0.0099   1.28E-06
2700      0.0100   1.72E-06    0.0100   1.72E-06    0.0100   1.74E-06    0.0100   1.81E-06    0.0100   1.83E-06
3000      0.0102   2.96E-06    0.0102   2.97E-06    0.0102   3.01E-06    0.0102   3.04E-06    0.0102   2.95E-06
4800      0.0103   1.74E-06    0.0103   1.74E-06    0.0103   1.74E-06    0.0103   1.75E-06    0.0103   1.81E-06
9900      0.0099   8.18E-07    0.0099   8.23E-07    0.0099   8.20E-07    0.0099   8.05E-07    0.0099   8.60E-07

TABLE I

Simulation Result of an 8-Descendant Tree with Loss Rate=1%



Samples   Full Likelihood      Pair Likelihood      Triple Likelihood    Single Pair          Single Triple
          Mean     Var         Mean     Var         Mean     Var         Mean     Var         Mean     Var
300       0.0088   1.59E-05    0.0089   1.64E-05    0.0089   1.68E-05    0.0091   2.36E-05    0.0088   1.95E-05
600       0.0089   1.12E-05    0.0089   1.14E-05    0.0089   1.16E-05    0.0088   1.46E-05    0.0089   1.26E-05
900       0.0091   7.76E-06    0.0091   7.80E-06    0.0091   7.83E-06    0.0092   9.74E-06    0.0091   8.67E-06
1200      0.0094   6.13E-06    0.0094   6.16E-06    0.0094   6.18E-06    0.0096   7.09E-06    0.0095   6.16E-06
1500      0.0096   4.55E-06    0.0096   4.72E-06    0.0096   4.81E-06    0.0097   4.36E-06    0.0096   4.45E-06
1800      0.0096   1.82E-06    0.0096   1.90E-06    0.0096   1.95E-06    0.0096   2.45E-06    0.0096   1.97E-06
2100      0.0097   3.14E-06    0.0097   3.11E-06    0.0097   3.11E-06    0.0098   3.39E-06    0.0097   3.04E-06
2400      0.0099   1.32E-06    0.0100   1.34E-06    0.0100   1.35E-06    0.0101   1.64E-06    0.0100   1.44E-06
2700      0.0100   1.72E-06    0.0100   1.69E-06    0.0100   1.67E-06    0.0101   2.11E-06    0.0100   1.90E-06
3000      0.0102   2.96E-06    0.0102   2.93E-06    0.0102   2.91E-06    0.0103   2.83E-06    0.0102   2.87E-06
4800      0.0103   1.74E-06    0.0104   1.74E-06    0.0104   1.74E-06    0.0104   2.06E-06    0.0104   2.01E-06
9900      0.0099   8.18E-07    0.0099   8.30E-07    0.0099   8.36E-07    0.0099   9.78E-07    0.0099   9.11E-07

TABLE II

Simulation Result of an 8-Descendant Tree, 6 of the 8 have Loss Rate=1% and the other 2 have Loss Rate=5%



Samples   Full Likelihood      Pair Likelihood      Triple Likelihood    Single Pair          Single Triple
          Mean     Var         Mean     Var         Mean     Var         Mean     Var         Mean     Var
300       0.0503   2.15E-04    0.0504   2.15E-04    0.0505   2.14E-04    0.0508   2.18E-04    0.0505   2.16E-04
600       0.0503   8.23E-05    0.0503   8.21E-05    0.0503   8.19E-05    0.0504   8.24E-05    0.0503   8.27E-05
900       0.0511   5.85E-05    0.0511   5.81E-05    0.0511   5.79E-05    0.0512   5.79E-05    0.0512   5.88E-05
1200      0.0506   4.93E-05    0.0506   4.97E-05    0.0507   4.99E-05    0.0507   4.85E-05    0.0507   4.93E-05
1500      0.0502   2.24E-05    0.0502   2.24E-05    0.0502   2.23E-05    0.0503   2.33E-05    0.0502   2.32E-05
1800      0.0500   3.89E-05    0.0500   3.85E-05    0.0500   3.83E-05    0.0501   3.91E-05    0.0500   3.94E-05
2100      0.0507   1.16E-05    0.0507   1.19E-05    0.0507   1.20E-05    0.0507   1.09E-05    0.0507   1.13E-05
2400      0.0510   1.40E-05    0.0510   1.43E-05    0.0510   1.44E-05    0.0510   1.40E-05    0.0510   1.43E-05
2700      0.0507   1.31E-05    0.0507   1.34E-05    0.0507   1.35E-05    0.0508   1.35E-05    0.0507   1.34E-05
3000      0.0508   6.65E-06    0.0508   6.98E-06    0.0508   7.14E-06    0.0508   6.79E-06    0.0508   6.85E-06
4800      0.0498   1.09E-05    0.0498   1.10E-05    0.0498   1.10E-05    0.0498   1.11E-05    0.0498   1.11E-05
9900      0.0496   5.35E-06    0.0496   5.38E-06    0.0497   5.40E-06    0.0496   5.48E-06    0.0496   5.48E-06

TABLE III

Simulation Result of an 8-Descendant Tree, the Loss Rate of the Root Link=5%, 4 of the 8 have Loss Rate=1% and the other 4 have Loss Rate=5%



to notice that the local estimator considering the triple co-observation performs slightly better than that considering the paired co-observation in terms of variance. This indicates that a local estimator is sensitive to the co-observation selected for its estimation, and that a co-observation involving more members can, to some degree, overcome the turbulence created by short-term losses and provide accurate estimates.

To further investigate the impact of loss rates on estimation, we conduct another round of simulations, where we increase the loss rate of the root link from 1% to 5%, set the loss rates of four descendant links to 5% and keep the rest at 1%. The result is presented in Table III, where the local estimators consider the observations obtained from the descendants that have a 1% loss rate. The obvious difference between this result and the previous two is the variance, which is an order of magnitude higher. This indicates that a large variance is expected for the estimates of a long path that traverses a number of serially connected links, since for small link loss rates the loss rate of a path is roughly proportional to the number of links in the path. To reduce the variance, we need to send more probes.

A. Classification 

The simulation study shows the stability of the explicit estimators proposed in this paper, where $A_k(2)$ and $A_k(3)$ perform as well as the full likelihood estimator proposed
in HI in all of the settings. Considering the differences in 
terms of the likelihood used by the estimators proposed so far, we can have a classification for them, where the full likelihood estimators and the local likelihood ones stand at the two extremes. The full likelihood estimator indiscriminately considers all of the co-observations in estimation and uses the average of the co-observations to overcome the possible surge created by a few individual co-observations. Because of this, we consider it a blind estimator. In contrast, a local estimator focuses on a specific co-observation in its estimation; the accuracy of an estimate then rests on the information provided by that co-observation, which can be vulnerable if the sample size is small. We call them the specific estimators. The composite likelihood estimators proposed in this paper stay in between the two extremes: they can consider one type of co-observation as in [12], a few types of co-observations, or a part of the co-observations. If there is no special bias toward a type of co-observation, the composite likelihood estimators perform similarly to the full likelihood one, since there is enough redundancy in the information provided by the co-observations.

B. Other Estimators 

Apart from the estimators in (16), there are still a large number of composite likelihood estimators for loss tomography. For instance, we proposed a number of estimators in [12] that divide the descendants of a node into groups and consider a group as a virtual descendant of the node. If $d_k$ is divided into two groups, we can divide $Y_k$ into two groups accordingly and let $k_1$ and $k_2$ denote the two virtual descendants. Then, as in (13), we can have $n_{k_1}(1)$ and $n_{k_2}(1)$ as the confirmed arrivals at the two virtual nodes $k_1$ and $k_2$. Then, we have a composite likelihood estimator as

$$1 - \frac{\gamma_k}{A_k} = \Big(1 - \frac{\gamma_{k_1}}{A_k}\Big)\Big(1 - \frac{\gamma_{k_2}}{A_k}\Big), \qquad (34)$$

where $\gamma_k = \frac{n_k(1)}{n}$, $\gamma_{k_1} = \frac{n_{k_1}(1)}{n}$ and $\gamma_{k_2} = \frac{n_{k_2}(1)}{n}$.

Expanding it with the same procedure as that used in the proof 

of theoremQ] one is able to find the estimator in fact is a partial 

pairwise estimator that focuses on the co-observations between $\{x \mid x \in \Sigma_{k_1} \wedge \#(x) = 2\}$ and $\{y \mid y \in \Sigma_{k_2} \wedge \#(y) = 2\}$. In addition, we can selectively pair data with a model, as is done in [19], to eliminate a local estimator that shares its members with other local estimators. Further, some of the estimators in (16) can be combined into a single estimator. In fact, (2) is a composite likelihood estimator that takes into account all possible co-observations in $\Sigma_k$ and gives the same weight to each of them.
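The grouping idea can be illustrated with a short sketch. It assumes that the estimator for two virtual descendants satisfies $1 - \gamma_k/A_k = (1 - \gamma_{k_1}/A_k)(1 - \gamma_{k_2}/A_k)$, which is the usual likelihood equation of a node with two children and has the closed-form solution $A_k = \gamma_{k_1}\gamma_{k_2}/(\gamma_{k_1} + \gamma_{k_2} - \gamma_k)$. A probe is treated as a confirmed arrival at a virtual node when at least one receiver in the group observes it. The function name and the data layout are hypothetical.

```python
def two_group_pass_rate(probes, group1, group2):
    """Estimate the pass rate to node k from two virtual descendants.

    probes: one tuple per probe, entry j is truthy if receiver j saw the probe.
    group1, group2: disjoint tuples of receiver indices forming the two groups.
    """
    n = len(probes)
    seen1 = [any(p[j] for j in group1) for p in probes]
    seen2 = [any(p[j] for j in group2) for p in probes]
    gamma_k1 = sum(seen1) / n
    gamma_k2 = sum(seen2) / n
    # Confirmed arrivals at node k: seen by at least one of the two groups.
    gamma_k = sum(a or b for a, b in zip(seen1, seen2)) / n
    # Fraction of probes seen by both groups (inclusion-exclusion).
    both = gamma_k1 + gamma_k2 - gamma_k
    if both <= 0:
        raise ValueError("no probe was observed by both groups")
    return gamma_k1 * gamma_k2 / both


if __name__ == "__main__":
    # Eight probes: four lost before node k, four split evenly over the outcomes.
    probes = [(1, 1), (1, 0), (0, 1), (0, 0)] + [(0, 0)] * 4
    print(two_group_pass_rate(probes, (0,), (1,)))
```

In the usage example, half of the probes reach node k and each group then observes half of those, so the estimate recovers a pass rate of one half.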

C. Robustness 

As theorem II 12b points out, the estimate obtained by 
the full likelihood estimator is correct if and only if $\forall x, I_k(x) > 0, x \in \Sigma_k$. If some of the $I_k(x) = 0$, a different likelihood equation is needed in order to have a correct estimate. [22] classifies the data sets into five classes. Besides the full likelihood equation, another four likelihood equations are proposed, each for a class of data sets that had not been considered previously. However, that is only a small portion of the cases that require new estimators. To handle them, we need to check the data set provided for estimation, and then select an estimator that fits the data set. However, this issue has never been raised previously because

• there is a lack of alternatives to full likelihood estimation, and

• there is a lack of knowledge about the correspondence between data and model.

Given the analysis and the estimators presented in this paper, checking and selecting become not only possible but also feasible. If the data required by $A_k(i)$ exist, the estimate obtained by the estimator is considered correct. In this regard, the estimators proposed in this paper can be viewed as trimmed estimators, since each of them requires only a part of the information in a data set. To rank the robustness of the estimators proposed in this paper, we have

$$rank(A_k(i)) > rank(A_k(j)), \quad \text{if } 2 \le i < j \le |d_k|,$$

since for $A_k(i)$ to be invalid there must exist $x$ with $I_k(x) = 0 \wedge \#(x) = i$. If so, all other $A_k(j), j \in \{i+1, .., |d_k|\}$, must be invalid as well, since $I_k(y) = 0$ for every $y \in \Sigma_k$ with $x \subset y$. To handle all cases that have $\exists x, I_k(x) = 0, x \in \Sigma_k \wedge \#(x) \ge 2$, we can either select an estimator from (16) that is not related to $I_k(x)$ or remove $I_k(x)$ and $\prod_{j \in x} \gamma_j$ from $A_k(\#(x))$.
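The checking step can be sketched as below, assuming that $A_k(i)$ is regarded as valid only when every $i$-wise co-observation count $I_k(x)$ is positive. Because a missing co-observation for $x$ implies that all supersets of $x$ are missing too, the scan can stop at the first invalid order. The function name and the data layout (one tuple of 0/1 observations per probe) are hypothetical.

```python
from itertools import combinations


def valid_orders(probes):
    """Return the orders i for which every i-wise co-observation occurs in the data."""
    d = len(probes[0])
    valid = []
    for i in range(2, d + 1):
        every_subset_seen = all(
            any(all(p[j] for j in x) for p in probes)
            for x in combinations(range(d), i)
        )
        if not every_subset_seen:
            break  # every higher order is invalid as well
        valid.append(i)
    return valid


if __name__ == "__main__":
    # Every pair is co-observed once, but the triple never is.
    probes = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
    print(valid_orders(probes))  # [2]
```

An estimator would then be selected among the returned orders; how to choose between several valid orders is left open here.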

D. With Missing Data 

Loss rate estimation with a part of the data missing has been considered in [23], where an expectation maximization procedure is used to approximate an estimate that corresponds to the full likelihood. With composite likelihood, this problem can be handled differently, without modeling the missing data process as in the method proposed in [24]. We can either

> select an estimator that is not related to the missing data, as discussed in Section VI-C, or

* add weight parameters to the likelihood function pro- 
posed in (01, where the likelihood objects involving 
missing data have a weight corresponding to the amount of missing data, for the missing at random (MAR) and missing completely at random (MCAR) models.
Then, an explicit estimator can be obtained by using the same method proposed in Section IV. The weight assigned to a likelihood object is inversely proportional to the amount of missing data.

VII. Conclusion 

This paper aims at finding insights that can lead to simple and accurate estimators for loss tomography. To achieve this goal, a well-known full likelihood estimator proposed previously is analyzed to examine whether using the full likelihood is necessary. The results of the analysis show that there are alternative ways to connect data to models without using the full likelihood. The sufficient statistics identified previously for the full likelihood model are then divided into a number of subsets according to a $\sigma$-algebra. Each of the subsets consists of a set of sufficient statistics for a model. Linking a subset to the model that generates it leads to an estimator that measures the fit between the two and can be solved explicitly. In light of this, a deeper investigation has been carried out that leads us to composite likelihood, which has drawn considerable attention in recent years for estimating unknown parameters involving complicated correlations. Using composite likelihood, a set of composite likelihood functions is proposed according to the correspondences between data and models, and subsequently a set of explicit estimators is put forward. In addition, the properties of the estimators are investigated and presented; they show that the accuracy of an estimate is proportional to the number of descendants. The explicit estimators turn a headache that has troubled researchers for many years into an asset. Apart from presenting the statistical properties in lemmas and theorems, a series of simulations is conducted, and the reported results show that the estimators proposed in this paper perform as well as the full likelihood estimator. The strategy and method used in this paper can be extended to the general topology.